Fwd: Apple Darwin disabled fsync?

From Peter Bierman
Subject Fwd: Apple Darwin disabled fsync?
Date
Msg-id a06010200be3da9564694@[17.202.21.231]
Responses Re: Fwd: Apple Darwin disabled fsync?  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Fwd: Apple Darwin disabled fsync?  (Greg Stark <gsstark@mit.edu>)
List pgsql-hackers
>Date: Sat, 19 Feb 2005 17:59:21 -0800
>From: Dominic Giampaolo <dbg@apple.com>
>Subject: Re: bad fsync? (A.M.)
>To: darwin-dev@lists.apple.com
>
>>MySQL makes the following claim at:
>>http://dev.mysql.com/doc/mysql/en/news-4-1-9.html
>>
>>"InnoDB: Use the fcntl() file flush method on Mac OS X versions 10.3
>>and up. Apple had disabled fsync() in Mac OS X for internal disk
>>drives, which caused corruption at power outages."
>>
>>First of all, is this accurate? A pointer to some docs or a tech note
>>on this would be helpful.
>>
>The comments about fsync() are wrong...
>
>On MacOS X, fsync() always has and always will flush all file data
>from host memory to the drive on which the file resides.  The behavior
>of fsync() on MacOS X is the same as it is on every other version of
>Unix since the dawn of time (well, since the introduction of fsync
>anyway :-).
>
>I believe that what the above comment refers to is the fact that
>fsync() is not sufficient to guarantee that your data is on stable
>storage, and on MacOS X we provide an fcntl(), called F_FULLFSYNC,
>to ask the drive to flush all buffered data to stable storage.
>
>Let me explain in more detail.  With fsync() even though the OS
>writes the data through to the disk and the disk says "yes I wrote
>the data", the data is not actually on permanent storage.  Unless
>you explicitly disable it, all disks have a write buffer which holds
>data you've written.  The disk buffers the data you wrote until it
>decides to flush it to the platters (and the writes may not be in
>the order you wrote them).  If you lose power or the system crashes
>before the data is written, you can wind up in a situation where only
>some of your data is actually on disk.  What is worse is that even if
>you write blocks A, B and C, call fsync() and then write block D you
>may find after rebooting that blocks A and D are on disk but B and C
>are not (in fact any ordering of A, B, C, and D is possible).
>
>While this may seem like a rare case it is not.  In fact if you sit
>down and pull the plug on a system you can make it happen in one or
>two plug pulls.  I have even gone so far as to watch this behavior
>with a logic analyzer on the ATA bus: I saw the data for two writes
>come across the ATA cable, the drive replied and said the writes were
>successful and then when we rebooted the data from the second write
>was correct on disk but the data from the first write was not.
>
>To deal with this we introduced the F_FULLFSYNC fcntl which will ask
>the drive to flush all of its buffered data to disk.  When an app
>needs to guarantee that data is on disk it should use F_FULLFSYNC.
>In most cases you do not need such a heavy handed operation and
>fsync() is good enough.  But in an app like a database, it is
>essential if you want transactional integrity.
>
>Now, a little bit more detail: on ATA drives we implement F_FULLFSYNC
>with the FLUSH_TRACK_CACHE command.  All drives sold by Apple will
>honor this command.  Unfortunately quite a few FireWire drive vendors
>disable this command and do not pass it to the drive.  This means that
>most external FireWire drives are not reliable if you lose power or
>the system crashes.  We can't work around that unless we ask the drive
>to disable the write cache completely (which hurts performance quite
>badly -- and even that may not be enough as some drives will ignore
>that request too).
>
>So in summary, I believe that the comments in the MySQL news posting
>are slightly confused.  On MacOS X fsync() behaves the same as it does
>on all Unices.  That's not good enough if you really care about data
>integrity and so we also provide the F_FULLFSYNC fcntl.  As far as I
>know, MacOS X is the only OS to provide this feature for apps that
>need to truly guarantee their data is on disk.
>
>Hope this clears things up.
>
>--dominic
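
For readers who want to see what the advice above might look like in code, here is a minimal sketch in C. The only names taken from the message are fsync() and the F_FULLFSYNC fcntl; the helper name durable_sync and the choice to fall back to plain fsync() when F_FULLFSYNC is missing or rejected are assumptions made for illustration, not something the message prescribes.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * durable_sync (hypothetical helper): push the data of fd as far toward
 * stable storage as the platform allows.
 * Returns 0 on success, -1 on failure with errno set.
 */
int durable_sync(int fd)
{
#ifdef F_FULLFSYNC
    /* Where F_FULLFSYNC is defined, ask the drive to flush its own
     * write buffer, not just the host's memory. */
    if (fcntl(fd, F_FULLFSYNC) != -1)
        return 0;
    /* Assumption: if the request is rejected (for example by a file
     * system that does not support it), fall through to fsync(). */
#endif
    /* Flushes host memory to the drive; per the message above, the
     * drive may still hold the data in its write buffer. */
    return fsync(fd);
}
```

An application that depends on ordering, such as a database writing a log record before a commit marker, would call a helper like this after the first write and only then issue the second write. On a drive or bridge that ignores the flush request, as the message describes for some FireWire enclosures, the call can still return success without the data being on the platters.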

